Cognitive Modeling with Context Sensitive Reinforcement Learning

Authors

  • Christian Balkenius
  • Stefan Winberg
Abstract

A reinforcement learning system is typically described as a black box which receives two types of input: the current state, S, and the current reinforcement, R. From these two inputs, the system has to figure out a policy that determines what action to perform in each state to maximize the reinforcement received in the future (Sutton & Barto, 1998). The future expected reinforcement can be estimated either by using the sum of all future reinforcement or with an exponentially decaying time horizon. It is also possible to take into account only the reinforcement received at the next goal action, which results in finite-horizon algorithms (e.g. Balkenius & Morén, 1999). Learning is viewed as the formation of associations between states and actions, which are represented by numerical values that are changed during learning. In most basic reinforcement learning algorithms, the policy for each state is learned individually, without regard for the similarity between different states. It would obviously be valuable if actions learned in one state could be generalized to other similar states. Such generalization can be introduced into a reinforcement learning algorithm in several ways. One possibility is to code the similarity between states by similar state vectors. Such methods have been proposed by Sutton (1996), who used a tile representation of the underlying state space, and Balkenius (1996), who used a multi-resolution representation. An alternative is to learn the underlying state representation during exploration based on the closeness of different states (Dayan, 1993). In both cases, learning becomes faster since each learning instance will be generalized to many similar states. In many cases, it makes sense to divide the state input into two parts: one that codes for the situation or context, and one that codes for the part of the state that controls the action (cf. Balkenius & Hulth, 1999, Houghes & Drogoul, 2001).
If such a combined representation is used together with the reinforcement algorithms described above, learning will generalize not only to similar states but also to similar contexts. The role of state and context will thus be symmetric.
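The combined representation described above can be illustrated with a minimal sketch. The following is not the authors' actual model, only an assumed tabular Q-learning agent (with an exponentially decaying horizon via a discount factor gamma) whose value table is keyed on a (context, state, action) triple, so that the context and state parts of the input play symmetric roles; all names and parameters here are illustrative assumptions.

```python
import random
from collections import defaultdict

class ContextQLearner:
    """Illustrative sketch: tabular Q-learning over (context, state) pairs."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (context, state, action) -> value
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount: exponentially decaying horizon
        self.epsilon = epsilon        # exploration rate

    def choose(self, context, state):
        # Epsilon-greedy action selection over the combined key.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(context, state, a)])

    def update(self, context, state, action, reward, next_context, next_state):
        # Standard Q-learning update; the only twist is that the table
        # is indexed by context as well as state.
        best_next = max(self.q[(next_context, next_state, a)]
                        for a in self.actions)
        key = (context, state, action)
        self.q[key] += self.alpha * (reward + self.gamma * best_next
                                     - self.q[key])
```

With a discount factor of 0.9 and a constant reward of 1 in a single absorbing state, the value converges toward 1 / (1 - 0.9) = 10, the closed-form discounted return. Generalization between similar states or contexts, as discussed in the abstract, would require replacing the exact-match table lookup with a similarity-based representation.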


Similar resources

Translating a Reinforcement Learning Task into a Computational Psychiatry Assay: Challenges and Strategies

Computational psychiatry applies advances from computational neuroscience to psychiatric disorders. A core aim is to develop tasks and modeling approaches that can advance clinical science. Special interest has centered on reinforcement learning (RL) tasks and models. However, laboratory tasks in general often have psychometric weaknesses and RL tasks pose special challenges. These challenges m...


When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other interval of task performance), what (the object...


Cognitive Modeling of Action Selection Learning

Our goal is to develop a hybrid cognitive model of how humans acquire skills on complex cognitive tasks. We are pursuing this goal by designing hybrid computational architectures for the NRL Navigation task, which requires competent sensorimotor coordination. In this paper, we describe results of directly fitting human execution data on this task. We next present and then empirically compare two ...


Social stress reactivity alters reward and punishment learning.

To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punis...


Cognitive flexibility in adolescence: Neural and behavioral mechanisms of reward prediction error processing in adaptive decision making during development

Adolescence is associated with quickly changing environmental demands which require excellent adaptive skills and high cognitive flexibility. Feedback-guided adaptive learning and cognitive flexibility are driven by reward prediction error (RPE) signals, which indicate the accuracy of expectations and can be estimated using computational models. Despite the importance of cognitive flexibility d...




Publication date: 2004